Decision fusion by boosting method for multi-modal voice activity detection
Abstract
In this paper, we propose a multi-modal voice activity detection (VAD) system that uses audio and visual information. In multi-modal (speech) signal processing, there are two methods for fusing the audio and the visual information: concatenating the audio and visual features, or employing audio-only and visual-only classifiers and then fusing the unimodal decisions. We investigate the effectiveness of decision fusion based on AdaBoost, a machine learning method that constructs an effective classifier by combining weak classifiers. It classifies input data into two classes based on the weighted results of the weak classifiers. In the proposed method, this fusion scheme is applied to the decision fusion of multi-modal VAD. Experimental results show that the proposed method is generally more effective.
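The weighted-vote idea described in the abstract can be illustrated with a minimal sketch of discrete AdaBoost. Everything concrete here is an assumption made for illustration and is not taken from the paper: the weak classifiers are simple threshold "stumps" on a single modality's score, the per-frame audio and visual scores are invented toy values, and the names `stump_predict`, `train_adaboost`, and `fuse` are hypothetical.

```python
import math

# Toy per-frame scores (audio_score, visual_score); all values are made up.
FRAMES = [(0.9, 0.8), (0.8, 0.2), (0.3, 0.9), (0.2, 0.1), (0.6, 0.3), (0.1, 0.6)]
LABELS = [1, 1, 1, -1, -1, -1]  # +1 = speech, -1 = non-speech


def stump_predict(stump, frame):
    """Hypothetical weak classifier: threshold one modality's score."""
    modality, threshold, polarity = stump
    return polarity if frame[modality] > threshold else -polarity


def train_adaboost(frames, labels, rounds):
    """Discrete AdaBoost over unimodal stumps; returns [(alpha, stump), ...]."""
    n = len(frames)
    weights = [1.0 / n] * n
    # Candidate stumps: each observed score as a threshold, per modality and polarity.
    candidates = [(m, frame[m], p)
                  for frame in frames
                  for m in range(len(frame))
                  for p in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        def weighted_error(stump):
            return sum(w for w, f, y in zip(weights, frames, labels)
                       if stump_predict(stump, f) != y)

        best = min(candidates, key=weighted_error)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(weighted_error(best), 1e-10), 1.0 - 1e-10)
        if err >= 0.5:
            break  # no stump beats chance on the reweighted frames
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Increase the weight of misclassified frames, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(best, f))
                   for w, f, y in zip(weights, frames, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def fuse(ensemble, frame):
    """Fused decision: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * stump_predict(stump, frame) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


ensemble = train_adaboost(FRAMES, LABELS, rounds=3)
decisions = [fuse(ensemble, f) for f in FRAMES]
print(decisions)
```

In this toy setup, no single threshold on either modality classifies all six frames correctly, while the weighted vote after three rounds does. The paper's actual weak classifiers, features, and training data may differ from this sketch.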
Related articles
Voice activity detection based on fusion of audio and visual information
In this paper, we propose a multi-modal voice activity detection (VAD) system that uses audio and visual information. Audio-only VAD systems are typically not robust to (acoustic) noise. Incorporating visual information, for example information extracted from mouth images, can improve the robustness since the visual information is not affected by the acoustic noise. In multi-modal (speech) signa...
An Augmented Multi-tiered Classifier for Instantaneous Multi-modal Voice Activity Detection
As mobile devices, intelligent displays, and home entertainment systems permeate digital markets, the desire for users to interact through spoken and visual modalities similarly grows. Previous interactive systems limit voice activity detection (VAD) to the acoustic domain alone, but the incorporation of visual features has shown great improvement in performance accuracy. When employing both ac...
Multimedia Evidence Fusion for Video Concept Detection via OWA Operator
We present a novel multi-modal evidence fusion method for high-level feature (HLF) detection in videos. The uni-modal features, such as color histogram, transcript texts, etc., tend to capture different aspects of HLFs and hence share complementariness and redundancy in modeling the contents of such HLFs. We argue that such inter-relations are key to effective multi-modal fusion. Here, we formulat...
Damage detection of multi-girder bridge superstructure based on the modal strain approaches
The research described in this paper focuses on the application of modal strain techniques on a multi-girder bridge superstructure with the objectives of identifying the presence of damage and detecting false damage diagnosis for such structures. The case study is a one-third scale model of a slab-on-girder composite bridge superstructure, comprised of a steel-free concrete deck with FRP rebars...
On the Improvements of Uni-modal and Bi-modal Fusions of Speaker and Face Recognition for Mobile Biometrics
The MOBIO database provides a challenging test-bed for speaker and face recognition systems because it includes voice and face samples as they would appear in forensic scenarios. In this paper, we investigate uni-modal and bimodal multi-algorithm fusion using logistic regression. The source speaker and face recognition systems were taken from the 2013 speaker and face recognition evaluations th...